Part 1: Brief description of the problem and data

1.1 Problem

A generative adversarial network (GAN) is a generative model built on an adversarial framework: two neural networks, a generator and a discriminator, are trained together with the goal of generating new realistic images from a set of training images. The two models act as adversaries: the generator learns to produce fake images that look like real ones (starting from random noise), while the discriminator learns to determine whether a given image is real or fake. The two models are trained together in a zero-sum game, and over time the generator gets better at producing images that closely resemble the real ones while the discriminator gets better at telling them apart. The process reaches equilibrium when the discriminator can no longer distinguish real images from fakes.

image.png Source: Google

In this project, we will build and train a Deep Convolutional Generative Adversarial Network (DCGAN) with Keras to generate Monet-style images.

DCGAN is a type of GAN that uses convolutional neural networks (CNNs) as the generator and discriminator. CNNs are specifically designed for image recognition tasks and are well-suited for generating images with GANs. DCGAN includes several architectural changes compared to a regular GAN. It uses transposed convolutional layers in the generator instead of fully connected layers, and replaces pooling layers with strided convolutions. It also uses batch normalization to stabilize training and help prevent the generator from collapsing.

image.png Source: towardsdatascience.com

As we can see, the Discriminator model is just a Convolutional classification model. In contrast, the Generator model is more complex as it learns to convert latent inputs into an actual image with the help of Transposed and regular Convolutions. In summary, while both GAN and DCGAN are used for generating new data, DCGAN specifically uses convolutional neural networks as the generator and discriminator, and includes several architectural changes to improve the stability and quality of generated data.

There are 4 major steps in the training:

  1. Build the generator.
  2. Build the discriminator.
  3. Define Loss Functions & Optimizers.
  4. Define the training loop & Visualize Images.

1.2 Data

In this project, I use a dataset from Kaggle, downloaded from:
https://www.kaggle.com/competitions/gan-getting-started/data

The dataset contains four directories: monet_tfrec, photo_tfrec, monet_jpg, and photo_jpg. The monet_tfrec and monet_jpg directories contain the same painting images, and the photo_tfrec and photo_jpg directories contain the same photos.

The monet directories contain Monet paintings. We will use these images to train our model.

The photo directories contain photos. We will add Monet-style to these images and submit our generated jpeg images as a zip file.

Files
monet_jpg - 300 Monet paintings sized 256x256 in JPEG format
monet_tfrec - 300 Monet paintings sized 256x256 in TFRecord format
photo_jpg - 7028 photos sized 256x256 in JPEG format
photo_tfrec - 7028 photos sized 256x256 in TFRecord format

Reference Sources:
https://www.kaggle.com/code/amyjang/monet-cyclegan-tutorial/notebook
https://www.tensorflow.org/tutorials/generative/dcgan
https://towardsdatascience.com/cgan-conditional-generative-adversarial-network-how-to-gain-control-over-gan-outputs-b30620bd0cc8

Part 2: Exploratory Data Analysis (EDA)

Load in the data

Load in the data by following the Monet CycleGAN Tutorial.
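As a sketch of that loading step: assuming the competition's TFRecord layout, where each record stores the JPEG bytes under an `"image"` key (the key name and the [-1, 1] scaling are assumptions taken from the tutorial pattern, not verified against this notebook's code):

```python
import tensorflow as tf

IMAGE_SIZE = [256, 256]

def decode_image(image_bytes):
    # Decode JPEG bytes and scale pixels to [-1, 1] to match a tanh generator output.
    image = tf.image.decode_jpeg(image_bytes, channels=3)
    image = (tf.cast(image, tf.float32) / 127.5) - 1.0
    return tf.reshape(image, [*IMAGE_SIZE, 3])

def read_tfrecord(example):
    # Assumed schema: each serialized example holds the JPEG bytes under "image".
    features = {"image": tf.io.FixedLenFeature([], tf.string)}
    parsed = tf.io.parse_single_example(example, features)
    return decode_image(parsed["image"])

def load_dataset(filenames, batch_size=32):
    dataset = tf.data.TFRecordDataset(filenames)
    dataset = dataset.map(read_tfrecord, num_parallel_calls=tf.data.AUTOTUNE)
    return dataset.batch(batch_size).prefetch(tf.data.AUTOTUNE)

# Hypothetical usage on the Kaggle path:
# monet_files = tf.io.gfile.glob("../input/gan-getting-started/monet_tfrec/*.tfrec")
# monet_ds = load_dataset(monet_files)
```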

Part 3: Building and training Deep Convolutional Generative Adversarial Network (DCGAN)

DCGAN is one of the most widely used, powerful, and successful GAN architectures. It is implemented with ConvNets in place of multi-layer perceptrons: the ConvNets use strided convolutions instead of max pooling, and the layers in this network are not fully connected.

3.1 Build the Generator

The generator network takes random Gaussian noise and maps it to images such that the discriminator cannot tell which images came from the dataset and which came from the generator.

image.png

Let’s define our generator model architecture:
The generator uses tf.keras.layers.Conv2DTranspose (upsampling) layers to produce an image from a seed (random noise). It starts with a Dense layer that takes this seed as input, then upsamples several times until it reaches the desired image size of 256x256x3. Notice the tf.keras.layers.LeakyReLU activation after each layer, except the output layer, which uses tanh.
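A minimal sketch of such a generator, loosely following the TensorFlow DCGAN tutorial pattern; the filter counts, kernel sizes, and latent dimension here are illustrative assumptions, not the notebook's exact configuration:

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_generator(latent_dim=100, alpha=0.2):
    """Map a latent noise vector to a 256x256x3 image in [-1, 1]."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(latent_dim,)),
        # Project the seed and reshape it into a small spatial feature map.
        layers.Dense(16 * 16 * 256, use_bias=False),
        layers.BatchNormalization(),
        layers.LeakyReLU(alpha),
        layers.Reshape((16, 16, 256)),
        # Each stride-2 Conv2DTranspose doubles the spatial size: 16 -> 32 -> 64 -> 128 -> 256.
        layers.Conv2DTranspose(128, 5, strides=2, padding="same", use_bias=False),
        layers.BatchNormalization(),
        layers.LeakyReLU(alpha),
        layers.Conv2DTranspose(64, 5, strides=2, padding="same", use_bias=False),
        layers.BatchNormalization(),
        layers.LeakyReLU(alpha),
        layers.Conv2DTranspose(32, 5, strides=2, padding="same", use_bias=False),
        layers.BatchNormalization(),
        layers.LeakyReLU(alpha),
        # The output layer uses tanh so pixel values land in [-1, 1].
        layers.Conv2DTranspose(3, 5, strides=2, padding="same", activation="tanh"),
    ])
```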

3.2 Build the Discriminator

The discriminator will be trained to tell the difference between images that come from the dataset and images that come from the generator.

image.png

Let’s now define the discriminator architecture:
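A corresponding discriminator sketch: a plain convolutional classifier with strided convolutions in place of pooling, again with illustrative filter counts rather than the notebook's exact ones. It outputs a single raw logit; passing it through a sigmoid gives the real-vs-fake probability discussed below:

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_discriminator(alpha=0.2):
    """Score 256x256x3 images: higher logit means more likely real."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(256, 256, 3)),
        # Strided convolutions downsample in place of pooling layers.
        layers.Conv2D(64, 5, strides=2, padding="same"),
        layers.LeakyReLU(alpha),
        layers.Dropout(0.3),
        layers.Conv2D(128, 5, strides=2, padding="same"),
        layers.LeakyReLU(alpha),
        layers.Dropout(0.3),
        layers.Flatten(),
        # Single raw logit; apply a sigmoid to read it as a probability.
        layers.Dense(1),
    ])
```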

From the result above, we can see that the discriminator's decision is below 0.5 (closer to 0), so the image is classified as fake.

3.3 Define Loss Functions & Optimizers
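One common choice, following the TensorFlow DCGAN tutorial pattern, is binary cross-entropy computed on raw logits with a separate Adam optimizer for each network; the 1e-4 learning rate here is an assumption, not necessarily the notebook's value:

```python
import tensorflow as tf

cross_entropy = tf.keras.losses.BinaryCrossentropy(from_logits=True)

def discriminator_loss(real_output, fake_output):
    # The discriminator should score real images as 1 and generated images as 0.
    real_loss = cross_entropy(tf.ones_like(real_output), real_output)
    fake_loss = cross_entropy(tf.zeros_like(fake_output), fake_output)
    return real_loss + fake_loss

def generator_loss(fake_output):
    # The generator succeeds when the discriminator scores its images as real (1).
    return cross_entropy(tf.ones_like(fake_output), fake_output)

generator_optimizer = tf.keras.optimizers.Adam(1e-4)
discriminator_optimizer = tf.keras.optimizers.Adam(1e-4)
```

With logits of 0 (an undecided discriminator), both losses reduce to ln 2 per term, which is a handy sanity check during debugging.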

3.4 Define DCGAN model with the training loop & Visualize Images

The training loop begins with the generator receiving random noise as input, which it uses to produce an image. The discriminator is then used to classify real images (drawn from the training set) and fake images (produced by the generator). The loss is calculated for each model, and the gradients are used to update both the generator and the discriminator.

Here, we create a DCGAN_model which comprises:

+ train(): performs the training steps for the generator and discriminator.
+ generate_images(): generates images from noise using the generator.
+ generate_and_plot_images(): generates images from the generator and visualizes them.
+ train_loop(): alternates between training the generator and discriminator for a given number of epochs, printing the running time and mean loss every 200 epochs.

We use tf.function to compile the training step into a TensorFlow graph and improve performance.
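The core training step described above might look like the following self-contained sketch; the helper names, loss formulation, and learning rate are illustrative assumptions rather than the notebook's exact code:

```python
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)
gen_opt = tf.keras.optimizers.Adam(1e-4)
disc_opt = tf.keras.optimizers.Adam(1e-4)

@tf.function
def train_step(images, generator, discriminator, latent_dim=100):
    # One simultaneous update: sample noise, generate fakes, score both sets,
    # then apply gradients to each network from its own loss.
    noise = tf.random.normal([tf.shape(images)[0], latent_dim])
    with tf.GradientTape() as gen_tape, tf.GradientTape() as disc_tape:
        fake_images = generator(noise, training=True)
        real_logits = discriminator(images, training=True)
        fake_logits = discriminator(fake_images, training=True)
        # The generator wants fakes scored as real; the discriminator wants them apart.
        gen_loss = bce(tf.ones_like(fake_logits), fake_logits)
        disc_loss = (bce(tf.ones_like(real_logits), real_logits)
                     + bce(tf.zeros_like(fake_logits), fake_logits))
    gen_grads = gen_tape.gradient(gen_loss, generator.trainable_variables)
    disc_grads = disc_tape.gradient(disc_loss, discriminator.trainable_variables)
    gen_opt.apply_gradients(zip(gen_grads, generator.trainable_variables))
    disc_opt.apply_gradients(zip(disc_grads, discriminator.trainable_variables))
    return gen_loss, disc_loss
```

Decorating the step with @tf.function traces it once into a graph, which is what gives the speedup mentioned above.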

3.5 Tuning model

To tune the DCGAN model, instead of the LeakyReLU activation with alpha=0.2 used above, I now create two new generators with LeakyReLU alpha=0.3 and alpha=0.4.
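One hypothetical way to produce the three variants is to make the LeakyReLU slope a parameter of the builder; the heavily reduced architecture below is purely illustrative, since only the slope changes between the variants:

```python
import tensorflow as tf
from tensorflow.keras import layers

def make_generator(alpha):
    # Illustrative two-layer stand-in for the full generator; in the real
    # experiment only the LeakyReLU slope differs between the three models.
    return tf.keras.Sequential([
        tf.keras.Input(shape=(100,)),
        layers.Dense(16 * 16 * 64, use_bias=False),
        layers.LeakyReLU(alpha),
        layers.Reshape((16, 16, 64)),
        layers.Conv2DTranspose(3, 5, strides=2, padding="same", activation="tanh"),
    ])

# One generator per slope under comparison.
generators = {a: make_generator(a) for a in (0.2, 0.3, 0.4)}
```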

3.5.1 DCGAN Model 2:

Architecture:

3.5.2 DCGAN Model 3:

Architecture:

Compare three models:

Looking at the results above, we can conclude that in this case the DCGAN 1 model, whose generator uses LeakyReLU activation with alpha=0.2, has the best performance with the lowest mean loss. Thus, I will use DCGAN 1 to generate 7000 Monet-style images.

Part 4: Submit images

Part 5: Conclusion and Takeaways

The goal of this project is to generate 7000 Monet-style images using DCGAN models. There are 5 parts:
(1) Brief description of the problem and data
(2) Exploratory Data Analysis
(3) Building and training DCGAN models
(4) Submit images
(5) Conclusion and Takeaways

DCGAN, or Deep Convolutional Generative Adversarial Network, is a type of generative model that can learn to generate new images by training on a dataset of existing images. DCGANs have shown impressive results in generating realistic images of faces, animals, landscapes, and other objects. Training a DCGAN model requires a significant amount of computational resources and can take a long time, depending on the size of the dataset and the complexity of the model. It is also important to carefully tune the hyperparameters to achieve the best possible results, for example:

+ add more layers or different types of layers and observe the effect on training time and stability
+ change the number of filters
+ adjust the activation functions
+ adjust the learning rate: a high learning rate can cause the model to overshoot the optimal weights, while a low learning rate can result in slow convergence
+ add regularization techniques such as dropout, weight decay, or spectral normalization to reduce overfitting and improve the generalization performance of the DCGAN